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Abstract 

We consider the PC algorithm for estimating a Markov equivalence class of directed acyclic 
graphs. This algorithm is known to be order-dependent, in the sense that the output de- 
pends on the order in which the variables are given. We show that this order-dependence is 
not just an aesthetical problem, and can lead to highly variable results in high-dimensional 
settings. We propose a simple modification, called PC-stable, that yields order- independent 
adjacencies in its output graph and is consistent in high-dimensional settings under the same 
conditions as the original PC algorithm. We compare the PC and PC-stable algorithms in 
simulation studies and on a yeast gene expression data set, and show that PC-stable yields 
significantly improved performance in high-dimensional settings. The modification used in 
PC-stable can be easily combined with other adaptations of the PC algorithm, including 
hybrid algorithms and the FCI algorithm. All software is implemented in the R-package 
pcalg. 
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1. Introduction 



We consider the PC algorithm (Spirtes et al. 2000) for learning Markov equivalence classes 



of directed acyclic graphs (DAGs). The PC algorithm is a so-called constraint-based 
method, as it relies on conditional independence constraints. Alternative approaches in- 



elude 


score-based and Bayesian methods (e.g., Chickering 


(2002); Cooper and Herskovits 


(1992 


); 


Heckerman et al. (1995); Spiegelhalter et al. (1993 


)), as well as hybrids of the dif- 


ferent approaches (e.g., Dash and Druzdzel (1999); Singh and Valtorta (1993); Spirtes and 


Meek 


( 


1995); Tsamardinos et al. (2006); van Dijk et al. (2003)). 



A Markov equivalence class of DAGs can be uniquely described by a so-called CPDAG, 
which is a graph consisting of directed and/or undirected edges (see Section [2] for a precise 
definition). The PC algorithm is widely used for learning CPDAGs in high-dimensional 
settings (e.g ., |Kalisch et al.| ( |2010[ ); |Nagarajan et al.| ( [2010] ); |Stekhoven et al.| ( [20121 ); |Zhang" 
et al. ( |2011 )). An important reason for this is that PC algorithm is computationally feasible 
for sparse graphs with up to thousands of variables, and open-source software is available 
(e.g., pcalg (|Kalisch et aT] |2012j) and TETRAD IV ([Spirtes et aTj|2000|)). Moreover, the 
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PC algorithm has been shown to be consistent for high-dimensional sparse graphs under 
some conditions (Harris and Drton, 2012 Kalisch and Biihlmann, 2007). 

When the PC algorithm is applied to data, it is generally order-dependent, in the sense 
that its output depends on the order in which the variables are given. Spirtes et al. (2000) 
(Section 5.4.2.4) exploit this order-dependence to obtain faster algorithms, by removing 
the "weakest" edges as early as possible. Dash and Druzdzel (1999) exploit the order- 



dependence for a different purpose, namely to obtain candidate graphs for a score-based 
approach. Cano et al. (2008) resolve the order-dependence via a rather involved method 
based on measuring edge strengths. Overall, however, the order-dependence of the PC algo- 
rithm has received relatively little attention in the literature, suggesting that it seems to be 
regarded as a minor issue. We found, however, that the order-dependence can become very 
problematic for high-dimensional data, leading to highly variable results and conclusions 
for different orderings on the variables. 

In particular, we analyzed a yeast gene expression data set (Hughes et al. (2000); see 
Section [6] for more detailed information) containing gene expression levels of 5361 genes for 
63 wild- type yeast organisms. First, we considered estimating the skeleton of the CPDAG, 
that is, the undirected graph that results by disregarding all arrowheads in the CPDAG. 
Figure |l(a) shows the large variability in the estimated skeletons when we use 25 random 
orderings of the variables. Each estimated skeleton consists of roughly 5000 edges which 
can be roughly divided into three groups: about 1500 are highly stable and occurred in all 
orderings, about 1500 are moderately stable and occurred in at least 50% of the orderings, 
and about 2000 are unstable and occurred in at most 50% of the orderings. 

An important motivation for learning DAGs lies in their causal interpretation. We 
therefore also investigated the effect of different variable orderings on causal inference that 
is based on the PC algorithm. In particular, we applied the IDA algorithm (Maathuis 



et al. 2009, 2010) to the yeast gene expression data discussed above. The IDA algorithm 



conceptually consists of two-steps: first one estimates the Markov equivalence class of DAGs 
using the PC algorithm, and then one applies Pearl's do-calculus (Pearl, 2000) to each DAG 
in the Markov equivalence class. (The algorithm uses a fast local implementation that does 
not require listing all DAGs in the equivalence class.) One can then obtain estimated 
lower bounds on the sizes of the causal effects between all pairs of genes. For each of 
the 25 random orderings, we ranked the gene pairs according to these lower bounds, and 
compared these rankings to a gold standard set of large causal effects computed from gene 
knock-out data. Figure l(b)| shows the large variability in the resulting receiver operating 
characteristic (ROC) curves. The ROC curve that was published in Maathuis et al. (2010) 
was significantly better than random guessing with p < 0.001, and is somewhere in the 
middle. Some of the other curves are much better, while there are also curves that are 
indistinguishable from random guessing. 

The order-dependence in the PC algorithm occurs in two different forms: order-dependence 
in the skeleton and order-dependence in the edge orientations. Figure 1(a) illustrates that 
order-dependence in the skeleton can be very large. In this paper, we propose a slight 
modification in the estimation of the skeleton, which removes this order-dependence. 

The remainder of the paper is organized as follows. In Section [2] we discuss some back- 
ground and terminology. Section [3] explains the original PC algorithm. Section [4] introduces 
our modification, called PC-stable, yielding order-independent skeletons. PC-stable is iden- 
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(a) Edges (black) occurring in the esti- 
mated skeletons for 25 random variable 
orderings, as well as for the original or- 
dering (shown as permutation 26). The 
edges along the a;-axis are ordered accord- 
ing to their presence in the output for the 
26 permutations, from edges that occurred 
in all permutations to edges that occurred 
in only one permutation. Edges that did 
not occur in any of the permutations were 
omitted. For technical reasons, only every 
10th edge is actually plotted. 
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(b) ROC curves corresponding to the 25 ran- 
dom orderings of the variables (solid black), 
where the curves are generated exactly as in 
|Maathuis et~aT| pOlO) . The ROC curve for 
the original ordering of the variables (dashed 
blue) was published in Maathuis et al. ( 2010 1. 
The dashed-dotted red curve represents ran- 
dom guessing. 



Figure 1: Analysis of the yeast gene expression data (Hughes et al 



orderings of the variables, using tuning parameter a = 0.01. 
graphs and resulting causal rankings are highly order-dependent. 



2000[ ) for 25 random 
The estimated 



tical to the original PC algorithm when perfect conditional independence information is 
used. When applied to data, PC-stable is consistent in high-dimensional settings under the 
same conditions as the original PC algorithm. Section [5] compares the PC and PC-stable 
algorithms in simulations, and Section [6] compares them on the yeast gene expression data 
discussed above. We show that PC-stable has significantly improved performance for sparse 
high- dimensional graphs. We close with a discussion in Section [7} 

2. Preliminaries 

2.1 Graph terminology 

A graph Q = (V, E) consists of a set of vertices V = {Xi , . . . , X p } and a set of edges E. 
The vertices represent random variables and the edges describe conditional independence 
and causal relationships. 

A graph containing only directed edges (— >) is called directed, one containing only undi- 
rected edges (— ) is called undirected, and one containing directed and/or undirected edges is 
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called partially directed. The skeleton of a partially directed graph is the undirected graph 
that results when all directed edges are replaced by undirected edges. 

All graphs we consider are simple, meaning that there is at most one edge between any 
pair of vertices. If an edge is present, the vertices are said to be adjacent. If all pairs of 
vertices in a graph are adjacent, the graph is called complete. The adjacency set of a vertex 
Xj in a graph Q = (V,E), denoted by adj(£/,Xj), is the set of all vertices in V that are 
adjacent to Xj in Q. A vertex Xj in adj(C?,Xj) is called a parent of Xi if Xj — > Xi. The 
corresponding set of parents is denoted by pa(£/,Xj). 

A path is a sequence of distinct adjacent vertices. A directed path is a path along directed 
edges that follows the direction of the arrowheads. A directed cycle is formed by a directed 
path from Xi to Xj together with the edge Xj — > Xi. A (partially) directed graph is called 
a (partially) directed acyclic graph ((P)DAG) if it does not contain directed cycles. 

A triple (Xj, Xj, X^) is called unshielded if Xi and Xj as well as Xj and X/% are adjacent, 
but Xi and X& are not adjacent. A v-structure (Xi, Xj, Xk) is an unshielded triple where 
the edges as oriented as Xi — > Xj <— X}.. 



2.2 Probabilistic and causal interpretation of DAGs 



We use the notation Xi _LL Xj|S to indicate that Xj is independent of Xj given S, where 
S is a set of variables not containing Xj and Xj (Dawid, 1980). If S is the empty set, we 
simply write Xj JL Xj. If Xj JL Xj|S, we refer to S as a separating set for (Xj,Xj). A 
separating set S for (Xj, Xj) is called minimal if there is no proper subset S' of S such that 
XJLXjIS'. 

A distribution Q is said to factorize according to a DAG Q = (V, E) if the the joint 
density of V = (Xi, . . . ,X p ) can be written as the product of the conditional densities of 
each variable given its parents in Q: q(X\, . . . , X p ) = fjf =1 q(Xi\pa(Q, Xj)). 

A DAG entails conditional independence relationships via a graphical criterion called 
d- separation (Pearl, 2000). If two vertices Xj and Xj in a DAG Q are d-separated by a 



subset S of the remaining vertices, then Xj JL Xj|S in any distribution Q that factorizes 
according to Q. A distribution Q is said to be faithful to a DAG Q if the reverse implication 
also holds, that is, if the conditional independence relationships in Q are exactly the same 
as those that can be inferred from Q using d-separation. 

Several DAGs can describe exactly the same conditional independence information. Such 
DAGs are called Markov equivalent and form a Markov equivalence class. Markov equivalent 
DAGs have the same skeleton and the same v-structures, and a Markov equivalence class can 



be described uniquely by a completed partially directed acyclic graph (CPDAG) ( Andersson 



et al. 


1997, 


Chickering 


2002) 



directed edge exists in every DAG in the Markov equivalence class, and for every undirected 



edge Xj — Xj there exists a DAG with Xj 



Xj and a DAG with Xj 



X,- in the Markov 



equivalence class. A CPDAG C is said to represent a DAG Q if Q belongs to the Markov 
equivalence class described by C. 



A DAG can be interpreted causally in the following way ( Pearl , 2000 , 2009 , Spirtes et al. 
2000): Xi is a direct cause of X2 only if X\ — > X2, and X\ is a possibly indirect cause of 



X2 only if there is a directed path from X\ to X2. 
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Under the assumptions of faithfulness and causal sufficiency (no hidden confounders 
and no hidden selection variables), the PC algorithm can be used to learn causal CPDAGs 



from conditional independence information. The FCI algorithm Spirtes et al. (2000J) is 



designed for the analogous problem without causal sufficiency. Several adaptations of the 



FCI algorithm, including the fast RFCI algorithm, can be found in Colombo et al. (2012). 



3. The PC algorithm 



We now describe the PC algorithm in detail. In Section 3.1 we first discuss the algorithm 
under the assumption that we have perfect conditional independence information between 
all variables in V. We refer to this as the oracle version. In Section [3.21 we discuss the more 
realistic situation where conditional independence relationships have to be estimated from 
data. We refer to this as the sample version. 

3.1 Oracle version 



A sketch of the PC algorithm is given in Algorithm 3.1 We see that the algorithm consists 
of three steps. Step 1 finds the skeleton and separation sets, while Steps 2 and 3 determine 
the orientations of the edges. 

Algorithm 3.1 The PC algorithm (oracle version) 

Require: Conditional independence information among all variables in V, and an ordering 
order(V) on the variables 



Find the skeleton and separation sets using Algorithm 3.2 

Orient unshielded triples in the skeleton based on the separation sets; 

Orient as many of the remaining undirected edges as possible by repeated application 

of rules R1-R3 (see text); 

return Output graph (C) and separation sets (sepset). 



Step 1 is given in pseudocode in Algorithm |3.2| We start with a complete undirected 
graph C. This graph is subsequently thinned out in the loop on lines 3-15 in Algorithm 
3.2, where an edge Xi — Xj is deleted if Xi _LL Xj\S for some subset S of the remaining 



variables. These conditional independence queries are organized in a way that makes the 
algorithm computationally efficient for high-dimensional sparse graphs. 

First, when £ = 0, all pairs of vertices are tested for marginal independence. If Xi JL Xj, 
then the edge Xi—Xj is deleted and the empty set is saved as separation set in sepset (Xj, Xj) 
and sepset ( Xj, Xi). After all pairs of vertices have been considered (and many edges might 
have been deleted), the algorithm proceeds to the next step with i = 1. 

When i = 1, the algorithm chooses an ordered pair of vertices (Xi,Xj) still adjacent in 
C, and checks Xi _LL Xj\S for subsets S of size t = 1 of adj(C, Xi) \ {Xj} and of adj(C, Xj) \ 
{Xi}. If such a conditional independence is found, the edge Xi — Xj is removed, and the 
corresponding conditioning set S is saved in sepset(Xj, Xj) and sepset ( Xj, Xi). If all pairs 
of adjacent vertices have been considered for conditional independence given all subsets of 
size I of their adjacency sets, the algorithm again increases £ by one. This process continues 
until all adjacency sets in the current graph are smaller than £. At this point the skeleton 
and the separation sets have been determined. 
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Algorithm 3.2 Step 1 of the PC algorithm (oracle version) 

Require: Conditional independence information among all variables in V, and an ordering 
order(V) on the variables 
1: Form the complete undirected graph C on the vertex set V 
2: Let £ = -1; 
3: repeat 
4: Let£ = £ + 1; 
5: repeat 

6: Select a (new) ordered pair of vertices (Xi,Xj) that are adjacent in C and satisfy 

|adj(C,Xi) \ {Xj}\ > £, using order(V); 
7: repeat 

8: Choose a (new) set S C adj(C,Xj) \ {Xj} with |S| = £, using order(V); 

9: if Xi and Xj are conditionally independent given S then 

10: Delete edge Xi — Xj from C; 

11: Let sepset (Xj, Xj) = sepset (Xj, Xi) = S; 

12: end if 

13: until Xi and Xj are no longer adjacent in C or all S C adj(C,Xj) \ {Xj} with 

|S| = £ have been considered 
14: until all ordered pairs of adjacent vertices (Xi,Xj) in C with |adj(C,Xj) \ {Xj}| > £ 

have been considered 
15: until all pairs of adjacent vertices (Xj, Xj) in C satisfy |adj(C, Xi) \ {Xj}\ < £ 
16: return C, sepset. 



Step 2 determines the v-structures. In particular, it considers all unshielded triples 
in C, and orients an unshielded triple (Xj,Xj,Xfc) as a v-structure if and only if Xj ^ 
sepset (Xi,X k ). 

Finally, Step 3 orients as many of the remaining undirected edges as possible by repeated 
application of the following three rules: 

Rl: orient Xj — X k into Xj —> X k whenever there is a directed edge Xi — > Xj such that 
Xi and Xk are not adjacent (otherwise a new v-structure is created); 

R2: orient Xi — Xj into Xi — > Xj whenever there is a chain Xi — > X k — > Xj (otherwise a 
directed cycle is created); 

R3: orient Xi — Xj into Xi — > Xj whenever there are two chains Xi — X^ — > Xj and 
Xi — X[ — > Xj such that X k and X\ are not adjacent (otherwise a new v-structure or 
a directed cycle is created). 

The PC algorithm was shown to be sound and complete. 



Theorem 1 (Theorem 5.1 on p.J^lO of Spirtes et al. (2000)) Let the distribution o/V be 
faithful to a DAG Q = (V, E), and assume that we are given perfect conditional independence 
information about all pairs of variables (Xi,Xj) in V given subsets S C V\{Xj, Xj} . Then 
the output of the PC algorithm is the CPDAG that represents Q . 
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We briefly discuss the main ingredients of the proof, as these will be useful for under- 
standing our modification in Section |4| The faithfulness assumption implies that conditional 
independence in the distribution of V is equivalent to d-separation in the graph Q. The 
skeleton of Q can then be determined as follows: Xi and Xj are adjacent in Q if and only if 
Xi and Xj are conditionally dependent given any subset S of the remaining nodes. Naively, 
one could therefore check all these conditional dependencies, which is known as the SGS 



algorithm (Spirtes et al. 2000). The PC algorithm obtains the same result with fewer 
tests, by using the following fact about DAGs: two variables Xi and Xj in a DAG Q are 
d-separated by some subset S of the remaining variables if and only if they are d-separated 
by pa(<5, Xi) or pa(£/, Xj). The PC algorithm is guaranteed to check these conditional inde- 
pendencies: at any point in the algorithm the graph C is a supergraph of the true CPDAG, 
and the algorithm checks conditional dependencies given all subsets of the adjacency sets, 
which obviously include the parent sets. 

The v-structures are determined based on Lemmas 5.1.2 and 5.1.3 of |Spirtes et al. 



(20001). The soundness and completeness of the orientation rules in Step 3 was shown in 



Meek (1995). 



3.2 Sample version 

In applications, we of course do not have perfect conditional independence information. 
Instead, we assume that we have an i.i.d. sample of size n of V = (X\, . . . , X p ). The 
conditional independence relationships have to be estimated from these data. 

For example, if the distribution of V is multivariate Gaussian, one can test conditional 
independence by testing for zero partial correlation. Let p n -,i,j\s be the partial correlation 
between Xi and Xj in V given a set S C V \ {Xi,Xj}, let p n -,i,j\s be the corresponding 
sample correlation, and let g(x) = I log (j^f) denote Fisher's z-transform. One can then 
consider 

Zn;i,j\S = 9(Pn;i,j\s) and £ n; ij|S = 9(Pn;i,j\s) 

and reject the null-hypothesis Ho(i,j\S) : ftjis = against the two-sided alternative 
HA(i,j\S) : Pij\s 7^ at significance level a if 

I4 ;i ,i|sl > in - |S| - Z)- l l 2 $>- l (l - a/2), 

where $(•) denotes the cumulative distribution function of a standard normal distribution, 
and we assume n — |S| — 3 > 0. 

A sample version of the PC algorithm can be obtained by replacing all steps where 
conditional independence decisions were taken by such statistical tests. We note that the 
parameter a is used for many tests, and plays the role of a tuning parameter, where smaller 
values of a tend to lead to sparser graphs. 



3.3 Order-dependence in the sample version 

We first consider the role of order(V) in Step 1. At each level of £, order(V) determines 
the order in which pairs of adjacent vertices (line 6 in Algorithm 3.2) and subsets S of their 
adjacency sets (line 8 in Algorithm 3.2) are considered. We assume that both of these tasks 
are performed according to the lexicographical ordering of order (V), which is the standard 
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implementation in pcalg (|Kalisch et aT} |2012[) and TETRAD IV flSpirtes et aT}|2PPP), and 



is called "PC-1" in |Spirtes et aL]pPP0| ) (Section 5.4.2.4). 

The skeleton C is updated after each edge removal. Hence, the adjacency sets typically 
change within one level of £, and this affects which conditional independencies are checked, 
as the algorithm only conditions on subsets of the adjacency sets. 

In the oracle version, we have perfect conditional independence information, and all 
orderings on the variables lead to the same output. In the sample version, however, we 
typically make mistakes in keeping or removing edges. In such cases, the resulting changes 
in the adjacency sets can lead to different outputs. This is illustrated in Example [TJ 



Example 1 Suppose that the distribution of V = {Xi, . . . , X5} is faithful to the DAG 



in Figure 2(a) This DAG encodes the following conditional independencies with minimal 
separating sets: X\ JL A2 and X2 _LL A4HA1, A3} . 

Suppose that we have an i.i.d. sample of{Xi, . . . , A5}, and that the following conditional 
independencies with minimal separating sets are judged to hold at some level a: X\ JL X2, 
X<i JL A4HA1, As} ; and X^ _LL A4HA1, A5}. Thus, the first two are correct, while the third 
is false. 

We now apply the PC algorithm with two different orderings: order\{V) = (Ai, A2, A3, 
X4,X§) and order2(V) = (Ai, A3, X2, X4, A5). The resulting skeletons are shown in Fig- 
2(b) and \2(c) , respectively. We see that the skeletons are different, and that both are 



ures 



incorrect as the edge A3 — A4 is missing. The skeleton for order2(V) contains an additional 
error, as there is an additional edge X2 — A4. 



We now go through Algorithm 3.2 to see what happened. We start with a complete undi- 
rected graph on V. When £ = P, variables are tested for marginal independence, and the 
algorithm correctly removes the edge between X\ and X2 ■ No other conditional independen- 
cies are found when £ = or £ = 1. When £ = 2, there are two pairs of vertices that are 
thought to be conditionally independent given a subset of size 2, namely the pairs (A2, A4) 
and (A3, A4). 

In orderi(V), the pair (A2,A4) is considered first. The corresponding edge is removed, 
as X2 JL A4HA1, A3} and {X\,X^} is a subset of adj(C,X4) = {X\, X2, X3, X5}. Next, 
the edge between (Aj^Ajj) is erroneously removed, because of the wrong decision that X3 JL 
X4|{Xl,X5} and the fact that {Xi,X§} is a subset of adj(C,Xs) = {Xi, X%, X4, X5} and 
adj(C,X i ) = {X 1 ,X 3 ,X 5 }. 

In order2(V), the pair (A^AL^) is considered first, and the edge between A3 and A4 is 
erroneously removed. Next, the algorithm considers the pair (A2, A4). Since adj(C, A2) = 
{A3,A4,As} and adj(C,X4) = {Ai,A2,A5}, the separating set {Ai,A3} o/(A2,A4) is 
not a subset of one of these adjacency sets. Therefore, the edge X2 — A4 remains. In 
other words, since (A3, A4) was considered first in order2(V) , the adjacency set of A4 was 
affected and no longer contained A3, so that the algorithm "forgot" to check the conditional 
independence Ai JL A4HA1, A3}. 

The sample version of the PC algorithm is also order-dependent in Steps 2 and 3, where 
different orderings can lead to different (and sometimes incorrect) separating sets and hence 
to different v-structures and edge orientations. This is illustrated in Example [3] in the next 
Section. 
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(a) True DAG. 




(b) Skeleton returned by the 
sample version of Algorithm 
3^1 with orden(V) and Al- 
gorithm |4.2| with any order- 
ing. 




(c) Skeleton returned by the 
sample version of Algorithm 



3.2 with order 2 (V). 



Figure 2: Graphs corresponding to Examples [T] and [2] 



4. The PC-stable algorithm 
4.1 Oracle version 

We now propose a modification of the PC algorithm, called PC-stable. The pseudocode is 



given in Algorithm 4.1 where the only change is in Step 1 of the algorithm. The pseudocode 



of the modified Step 1 is given in Algorithm 4.2 



The main difference between Algorithms 3.2 and |4.2| is given by the for- loop on lines 



6-8 in the latter one, which computes and stores the adjacency sets a(Xi) of all variables 
after each new size £ of the conditioning sets. These stored adjacency sets a(Xj) are used 
whenever we search for conditioning sets of this given size t. Consequently, an edge deletion 
on line 13 does no longer affect which conditional independencies are checked for other pairs 
of variables at this level of £, making the algorithm order-independent in the sample version 
(see Theorem [3]), while remaining sound and complete in the oracle version (see Theorem 
|2l). 



In other words, at each level of £, Algorithm 4.2 records which edges should be removed, 
but for the purpose of the adjacency sets it removes these edges only when it goes to the 
next value of £. In this sense our approach is opposite to the one of Spirtes et al. ( 2000 ) 
(Section 5.4.2.4), where edges are removed as early as possible. 



Theorem 2 Let the distribution o/V be faithful to a DAG Q = (V,E), and assume that we 
are given perfect conditional independence information about all pairs of variables (X{,Xj) 
in V given subsets S C V \ {Xi,Xj}. Then the output of the PC-stable algorithm is the 
CPDAG that represents Q . 



The proof of Theorem [2] is analogous to the proof of Theorem [T] for the original PC 



algorithm, as discussed in Section 3.1 
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Algorithm 4.1 The PC-stable algorithm (oracle version) 



Require: Conditional independence information among all variables in V, and an ordering 
order(V) on the variables 



Find the skeleton and separation sets using Algorithm 4.2 



Orient unshielded triples in the skeleton based on the separation sets; 

Orient as many of the remaining undirected edges as possible by repeated application 

of rules R1-R3 (see text); 

return Output graph (C) and separation sets (sepset). 



Algorithm 4.2 Step 1 of the PC-stable algorithm (oracle version) 

Require: Conditional independence information among all variables in V, and an ordering 



order(V) on the variables 
1: Form the complete undirected graph C on the vertex set V 
2: Let I = -1; 
3: repeat 
4: Let £ = 1 + 1; 
5: repeat 

6: for all vertices Xj in C do 
7: Let a(Xj) = adj(C,Xj) 

8: end for 

9: Select a (new) ordered pair of vertices (Xj,Xj) that are adjacent in C and satisfy 

|a(Xj) \ {Xj}\ > £, using order(V); 
10: repeat 

11: Choose a (new) set S C a(Xj) \ {Xj} with |S| = £, using order(V); 

12: if Xi and Xj are conditionally independent given S then 

13: Delete edge Xi — Xj from C; 

14: Let sepset (Xj , Xj) = sepset (Xj, Xj) = S; 

15: end if 

16: until Xj and Xj are no longer adjacent in C or all S C a(Xj) \ {Xj} with |S| = £ 
have been considered 

17: until all ordered pairs of adjacent vertices (Xj, Xj) in C with |a(Xj) \ {Xj}| > I have 
been considered 

18: until all pairs of adjacent vertices (Xj,Xj) in C satisfy |a(Xj) \ {Xj}| < £ 
19: return C, sepset. 



4.2 Order-independent skeletons in the sample version 

The sample version of the PC-stable algorithm can be obtained as before, by replacing con- 
ditional independence queries by conditional independence tests (see Section 3.2). Theorem 
[3] shows that the sample version of PC-stable yields order-independent skeletons. 

Theorem 3 The skeleton resulting from the sample version of the PC-stable algorithm is 
order-independent. 

Proof We consider the removal or retention of an arbitrary edge Xj — Xj at some level £. 
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The ordering of the variables determines the order in which the edges (line 9 of Algorithm 
4.2) and the subsets S of a(Aj) and a{Xj) (line 11 of Algorithm 4.2) are considered. By 



construction, however, the order in which edges are considered does not affect the sets a(Aj) 
and a(Xj). 

If there is at least one subset S of a{Xi) or a(Xj) such that Xi _LL Aj|S, then any 
ordering of the variables will find a separating set for Xi and Xj (but different orderings 
may lead to different separating sets). Conversely, if there is no subset S' of a(Aj) or a(Xj) 
such that Xi JL Xj\S', then no ordering will find a separating set. 

Hence, any ordering of the variables leads to the same edge deletions, and therefore to 
the same skeleton. ■ 



We now illustrate the PC-stable algorithm on the setting of Example 1. 



Example 2 We consider the sample version of the PC-stable algorithm, using as input 
the same sample conditional independencies and variable orderings as in Example^ The 
algorithm now outputs the skeleton shown in Figure \2(b)\ for both orderings. 

We again go through the algorithm to see what happens. We start with a complete 
undirected graph on V. The only conditional independence found when t = or i = 1 is 
X\ JL X2, and the corresponding edge is removed. When t = 2, the algorithm first computes 
the new adjacency sets: a{X\) = a(A 2 ) = {A3, A4, A5} and a(Xi) = V\{ Xi} for i = 3, 4, 5. 
There are two pairs of variables that are thought to be conditionally independent given a 
subset of size 2, namely (A2, A4) and (A3, A4). Since the sets a(Aj) are not updated after 
edge removals, it does not matter in which order we consider these pairs. Both orderings 
lead to the removal of both edges, as the separating set {Ai, A3} for (A2, A4) is contained in 
a(A4), and the separating set {Ai, A5} for (A3, A4) is contained in both a(A3) and a(A4). 

The proof of Theorem [3] also shows that the separating sets resulting from the oracle 
and sample versions of the PC-stable algorithm are generally still order-dependent. In the 
sample version this can lead to order-dependence edge-orientations. We illustrate this in 
Example [3j 



Example 3 Suppose that the distribution o/V = {Ai, A2, A3, A4} is faithful to the DAG 



in Figure 3(a) This DAG encodes the following conditional independencies with minimal 
separating sets: Ai JL A 3 |A 2 , A 2 JL A 4 |A 3 , Ai JL A 4 |A 2 , and Ai JL A 4 |A 3 . 

We first consider the oracle PC and PC-stable algorithms (both give identical results) 
with two different orderings on the variables: order^(V) = (X±, A 2 , A3, A4) and order^i^V) = 
(A3, A 2 , Ai, A4). For order^(V), we obtain sepset(Xi, A4) = {A 2 } ; while for order^(V) we 
get sepset( X±, A4) = {A3}. Thus, the information in the separating sets is order- dependent. 
However, since both versions of the separating sets are correct, they both lead to the same 
edge orientations: neither one of the unshielded triples (Ai,A 2 ,A3) and (A 2 ,A3,A4) is 
oriented as a v-structure, so that none of the edges can be oriented. 

Now suppose that we have an i.i.d. sample of (X\, A 2 , A3, A4). Suppose that at some 
level a, all true conditional independencies are judged to hold, and A 2 JL A4IA1 is thought to 
hold by mistake. Now with orders(V) , we obtain the incorrect separating set sepset( A 2 , A4) = 
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{-X4}, while with order 4 (V) we obtain the correct separating set sepset(X 2 , X 4 ) = {X%}. So 
with order^iy), we incorrectly orient the unshielded triple (X 2 , X3, X4) as a v-structure (see 
Figure 3(c)), while order 4 (V) correctly leaves the edges undirected (see Figure \3(b)\ ). This il- 
lustrates that order- dependent separating sets may lead to order- dependent edge orientations 
in the sample versions of the PC and PC-stable algorithms. 




X 2 



x 3 




x 2 



x 3 




x 2 



x 3 



X, 



(a) True DAG. 




X 4 




X 4 



(b) Output of the 
oracle PC (-stable) 
algorithm with any 
ordering, and of 
the sample P (In- 
stable) algorithm 
with order4(V). 




x 4 



(c) Output of the 
sample PC(-stable) 
algorithm with 
order3(V). 



Figure 3: Graphs corresponding to Example [3j 



4.3 High-dimensional consistency 

The original PC algorithm has been shown to be consistent for certain sparse high-dimensio- 



nal graphs. In particular, Kalisch and Biihlmann (2007) proved consistency for multivariate 



Gaussian distributions, when using the partial correlation tests discussed in Section 3.2 



More recently, Harris and Drton (2012) showed consistency for the broader class of Gaussian 



copulas when using rank correlations, under slightly different conditions. 

These high-dimensional consistency results allow the DAG Q and the number of observed 



variables p in V grow as a function of the sample size, so that p 



Pi, 



V 



V n 



(X n> i, . . . ,X n>Pn ) and Q = Q n . The corresponding CPDAGs that represent Q n are denoted 
by C n , and the estimated CPDAGs using tuning parameter a n are denoted by C n (a n ). Then 
the consistency results say that, under some conditions, there exists a sequence a n such that 
P(C n (a n ) = C n ) — > 1 as n — > 00. 

These consistency results rely on the fact that the PC algorithm only performs con- 
ditional independence tests between pairs of variables given subsets S of size less than or 
equal to the degree of the graph (when no errors are made). Our modification still obeys 



this property, and therefore the consistency results of Kalisch and Biihlmann (2007) and 



Harris and Drton (2012) remain valid for the PC-stable algorithm, under exactly the same 



conditions as for the original PC algorithm. 



5. Simulations 

We compared the estimation performance and computing time of the PC and PC-stable 
algorithms using simulated data. 
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5.1 Simulation set-up 

We used the following procedure to generate a random DAG with a given number of vertices 
p and expected neighborhood size E(N). First, we generated a random adjacency matrix 
A with independent realizations of Bernoulli(-E(A r )/(p — 1)) random variables in the lower 
triangle of the matrix and zeroes in the remaining entries. Next, we replaced the ones in A 
by independent realizations of a UniformQO.l, 1]) random variable. A nonzero entry A« can 
be interpreted as an edge from Xj to Xi with "strength" Ay , in the sense that X\ , . . . , X p 
can be generated as follows: X\ = e\ and Xj = Y^i~i Ai r X r + €j for i = 2, . . . ,p, where 
ei, . . . , e p are mutually independent A/"(0, 1) random variables. The variables X\, . . . , X p 
then have a multivariate Gaussian distribution with mean zero and covariance matrix S = 
(1 — — A)~ T , where 1 is the p x p identity matrix. 

We used a low-dimensional and a high-dimensional setting. For the low-dimensional 
setting we generated 250 random DAGs with p = 50 and E(N) = 2, and for each DAG we 
generated an i.i.d. sample of size n = 1000. For the high-dimensional setting we generated 
250 random DAGs with p = 1000 and E(N) = 2, and for each DAG we generated an i.i.d. 
sample of size n = 50. 

We estimated each graph for 50 random orderings of the variables, using the sample ver- 
sions of the PC and PC-stable algorithms. We used the partial correlation tests discussed in 



Section [3^2] at level a, where we took a G {0.000625,0.00125,0.0025,0.005,0.01,0.02,0.04} 
for the low-dimensional setting, and a G {0.00125,0.0025,0.005,0.01,0.02,0.04,0.08} for 
the high-dimensional setting. Thus, for each randomly generated graph, we obtained 50 
estimated CPDAGs from each algorithm, for each value of a. 

All simulations were performed on an AMD Opteron(tm) Processor 6174 with 128 GB 
on Linux using R 2.15.1. 



5.2 Estimation performance 

For the low-dimensional setting, the two algorithms were basically indistinguishable in terms 
of estimation performance and computing time. For the high-dimensional setting, however, 
the performance of the algorithms was rather different. We therefore only discuss the high- 
dimensional results. 

Figure [4] shows the estimation performance of the algorithms. The top- left panel shows 
that the PC-stable algorithm returns graphs with fewer edges than the PC algorithm, for 
all values of a. This is related to the fact that the PC-stable algorithm tends to perform 
more tests than the PC algorithm, see also Table [T] 

The other panels show that the PC-stable algorithm has significantly better performance 
than the PC algorithm in terms of the number of estimation errors in the skeleton (top-right 
panel), the True Discovery Rate (TDR, bottom-left panel), and the Structural Hamming 
Distance (SHD, bottom-right panel), for all values of a. The TDR is defined as the pro- 
portion of edges in the estimated skeleton that are also present in the true skeleton. The 
SHD counts the minimum number of edge insertions, deletions, and flips that are needed 
in order to transform the estimated graph into the true one. 

Figure [5] shows more detailed results for the estimated skeletons for one of the 250 graphs 
(randomly chosen), for four different values of a. We see that for each value of a shown, 
the original PC algorithm yielded a certain number of stable edges that were present for all 
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DC 
Q 




Figure 4: Estimation performance of the PC algorithm (triangles; dashed line) and the PC- 
stable algorithm (circles; solid line), for different values of a (x-axis displayed in 
log 2 scale) in the high-dimensional simulation setting with p = 1000 and n = 50. 
The results are shown as averages plus or minus one standard deviation, computed 
over 250 randomly generated graphs and 50 random variable orderings per graph. 
(We note that PC-stable is order-independent with respect to the measures in the 
first three panels, but order-dependent with respect to the SHD, as this measure 
relies on the edge orientations.) 



25 variable orderings, but also a large number of extra edges that pop in or out for different 
orderings. The PC-stable algorithm yielded far fewer edges (shown in red), and roughly 
captured the edges that were stable among the different variable orderings for the original 
PC algorithm. 
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Figure 5: Estimated edges with the PC algorithm (black) for 25 random orderings on the 
variables, as well as with the PC-stable algorithm (red), for a random graph from 
the high-dimensional setting with p = 1000 and n = 50 and four different values 
for q. (We used only 25 of the 50 random orderings in order to properly display 
the results.) The edges along the x-axes are ordered according to their presence 
in the 25 random orderings using the original PC algorithm. Edges that did not 
occur for any of the orderings were omitted. 



5.3 Number of tests and computing time 



One can easily deduce that Step 1 of the PC and PC-stable algorithms perform the same 
number of tests for I = 0, because the adjacency sets do not play a role at this stage. 
Moreover, for t = 1 the PC-stable algorithm performs at least as many tests as the PC 
algorithm, since the adjacency sets a(Xj) (see Algorithm 4.2) are always supersets of the 
adjacency sets adj(Aj) (see Algorithm 3.2). For larger values of £, however, it is difficult to 
analyze the number of tests analytically. 
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Table [T] therefore shows the average number of tests that were performed by Step 1 
of the two algorithms, separated by size of the conditioning set, where we considered the 
high- dimensional setting with a = 0.08 since this was most computationally intensive. As 
expected the number of marginal correlation tests was identical for both algorithms. For 
I = 1 and i = 2, the PC-stable algorithm performed about twice as many tests as the PC 
algorithm, amounting to about 5.7 x 10 5 and 5.2 x 10 additional tests, respectively. For 
larger values of £, the PC-stable algorithm performed fewer tests than the PC algorithm, 
since the additional tests for t = 1 and £ = 2 lead to a sparser skeleton. However, since the 
PC algorithm also performed relatively few tests for larger values of £, the absolute difference 
in the number of tests for large t is rather small. In total, the PC-stable algorithm performed 
about 6.1 x 10 5 more tests than the original PC algorithm. 





PC algorithm 


PC-stable algorithm 


e = o 


5.40 x 10 5 (2.90 x 10 2 ) 


5.40 x 10 5 (2.90 x 10 2 ) 


e = i 


4.52 x 10 5 (6.98 x 10 3 ) 


1.02 x 10 6 (1.45 x 10 4 ) 


£ = 2 


6.70 x 10 4 (2.75 x 10 3 ) 


1.19 x 10 5 (5.52 x 10 3 ) 


£ = 3 


6.58 x 10 3 (4.83 x 10 2 ) 


1.76 x 10 3 (2.62 x 10 2 ) 


£ = A 


1.08 x 10 3 (1.54 x 10 2 ) 


2.07 x 10 2 (6.37 x 10 1 ) 


£ = 5 


9.82 x 10 1 (3.72 x 10 1 ) 


1.32 x 10 1 (1.35 x 10 1 ) 


£ = 6 


3.23 x 10° (5.53 x 10°) 


0.36 x 10° (1.55 x 10°) 


£ = 7 


0.25 x 10" 1 (0.45 x 10°) 




Total 


1.07 x 10 6 (9.32 x 10 3 ) 


1.68 x 10 6 (1.90 x 10 4 ) 



Table 1: Number of tests performed by Step 1 of the PC and PC-stable algorithms for 
each size of the conditioning sets £, in the high-dimensional setting with p = 1000, 
n = 50 and a = 0.08. The results are shown as averages (standard deviations) 
over 250 random graphs and 50 random variable orderings per graph. 



Table[2]shows the average runtime of the PC and PC-stable algorithms. We see that the 
PC-stable algorithm is somewhat slower than the PC algorithm for all values of a, which 
can be explained by the fact that the PC-stable algorithm tends to perform a larger number 
of tests (cf . Table [T]) . 



6. Yeast gene expression data 

We also compared the PC and PC-stable algorithms on the yeast gene expression data 
(Hughes et al. 2000) that were already briefly discussed in Section [TJ In Section 6.1 



we 



consider estimation of the skeleton of the CPDAG, and in Section [672] we consider estimation 
of bounds on causal effects. 

We used the same pre-processed data as in Maathuis et al. ( |2010 ). These contain: (1) 
expression measurements of 5361 genes for 63 wild-type cultures (observational data of size 
63 x 5361), and (2) expression measurements of the same 5361 genes for 234 single-gene 
deletion mutant strains (interventional data of size 234 x 5361). 
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PC algorithm 


PC-stable algorithm 


a = 0.00125 


94. 76 {6.00) 


100.54 (3.84) 


„ n not: 
a = 0.025 


C\A C\R /O £1 ^ 

94.96 (2.51J 


100. oO (z. /UJ 


a = 0.05 


95.50 (3.14) 


101.25 (3.33) 


a = 0.01 


98.53 (3.60) 


104.83 (3.47) 


a = 0.02 


103.07 (3.44) 


112.21 (3.71) 


a = 0.04 


118.57 (4.54) 


138.35 (5.24) 


a = 0.08 


182.76 (7.42) 


259.28 (11.04) 



Table 2: Run time in seconds of the PC and PC-stable algorithms for the high-dimensional 
setting with p = 1000 and n = 50. The results are shown as averages (standard 
deviations) over 250 random graphs and 50 random variable orderings per graph. 



6.1 Estimation of the skeleton 



We applied the PC and PC-stable algorithms to the observational data. We saw in Section 
[T] that the PC algorithm yielded estimated skeletons that were highly dependent on the 
variable ordering, as shown in black in Figure [6] for the 26 variable orderings (the original 
ordering and 25 random permutations of the variables). The PC-stable algorithm does not 
suffer from this order-dependence, and the estimated skeleton is shown in the figure in red. 
We see that the PC-stable algorithm yielded a far sparser skeleton (2086 edges for PC-stable 
versus 5015-5159 edges for the PC algorithm, depending on the variable ordering). Just as 
in the simulations in Section [5] the PC-stable algorithm roughly captured the edges that 
were stable among the different variable orderings for the original PC algorithm. 

To make "captured the edges that were stable" somewhat more precise, we defined the 
following two sets: Set 1 contained all edges (directed edges) that were present for all 26 
variable orderings using the original PC algorithm, and Set 2 contained all edges (directed 
edges) that were present for at least 90% of the 26 variable orderings using the original PC 
algorithm. Set 1 contained 1478 edges (7 directed edges), while Set 2 contained 1700 edges 
(20 directed edges). 

Table [3] shows how well the PC and PC-stable algorithms could find these stable edges 
in terms of true positives and false positives over the different variable orderings. We see 
that the number of true positives is about the same for both algorithms, while the PC-stable 
algorithm has far fewer false positives. 



6.2 Estimation of causal effects 



We used the interventional data as the gold standard for estimating the total causal effects 
of the 234 deleted genes on the remaining 5361 (see Maathuis et al. (2010)). We then 
defined the top 10% of the largest effects in absolute value as the target set of effects, and 



we evaluated how well IDA (Maathuis et al. 
observational data. 



2009, 2010) identified these effects from the 
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5000 10000 15000 

edges 



(a) As Figure |l(a)| plus the edges occur- 
ring in the estimated skeleton using the 
PC-stable algorithm (red, shown as per- 
mutation 27). 




5000 10000 15000 

edges 



(b) The step function shows the proportion 
of the 26 permutations in which the edges 
were present for the original PC algorithm, 
where the edges are ordered as in Figurc [6(a)| 
The red bars show the edges present in the 
estimated skeleton using the PC-stable algo- 
rithm. 



Figure 6: Analysis of estimated skeletons of the CPDAGs for the yeast gene expression 
data (iHughes et all |2000h , using the PC and PC-stable algorithms. The PC- 



stable algorithm yields an order-independent skeleton that roughly captures the 
edges that were stable among the different variable orderings for the original PC 
algorithm. 





Edges 


Directed edges 


PC algorithm 


PC-stable algorithm 


PC algorithm 


PC-stable algorithm 


Set 1 


TP 


1478 (0) 


1478 (0) 


7(0) 


7(0) 


FP 


3606 (38) 


607 (0) 


4786 (47) 


1444 (2) 


Set 2 


TP 


1688 (3) 


1688 (0) 


19 (1) 


20 (0) 


FP 


3396 (39) 


397 (0) 


4774 (47) 


1431 (2) 



Table 3: Number of true positives and false positives for edges and directed edges for the 
PC and PC-stable algorithms, using Sets 1 and 2 as target set. The results are 
shown as averages (standard deviations) over the 26 variable orderings. 



We saw in Figure 1(b) that IDA with the original PC algorithm is highly order-dependent. 
Figure [7] shows the same analysis with PC-stable, using tuning parameters a = 0.01 and 
a = 0.05. We see that using PC-stable generally yielded better and more stable results 
than the original PC algorithm for both values of a, where a = 0.05 seems to give best 
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results for very small numbers of false positives, while a 
numbers of false positives. Note that the curves for a - 



: 0.01 gives best results for larger 
0.01 are slightly worse than the 



reference curve of Maathuis et al. (2010) towards the beginning of the curves, but this can 
be explained by the fact that the original variable ordering seems to be especially "lucky" 
for this part of the curve (cf. Figure |l(b)| ). There is still some variability in the ROC curves 
in Figure [7] due to the order-dependent orientations in the PC-stable algorithm, but this 



variability is much less prominent than in Figure 1(b) 



2400 
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2000 - 
1800 
1600 
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1000 2000 3000 

False positives 



(a) ROC curves corresponding to the 25 ran- 
dom orderings of the variables (solid black), 
where the curves are generated as in Maathuis 
et al.|(|2010l) but using PC-stable with a = 0.01. 



The ROC curves from Maathuis et al. (20101 



(dashed blue) and the one for random guessing 
(dashed-dotted red) are shown as references. 




1000 2000 3000 4000 

False positives 



(b) ROC curves corresponding to the 25 ran- 
dom orderings of the variables (solid black), 
where the curves are generated as in |Maathuis| 
et al.|l2010b but using PC-stable with a = 0.05. 



The ROC curves from Maathuis et al. (2010) 



(dashed blue) and the one for random guessing 
(dashed-dotted red) are shown as references. 



Figure 7: Analysis of yeast gene expression data (Hughes et al. 2000), for 25 random per- 



mutations of the variables, using the PC-stable algorithm. The resulting causal 
rankings are no longer highly order-dependent. 



7. Discussion 

Due to its computational efficiency, the PC algorithm is widely used for estimating CPDAGs 
in sparse high- dimensional settings. We found, however, that the order-dependence of the 
PC algorithm is especially problematic in these settings. We proposed a simple modification 
of the PC algorithm, called PC-stable, that removes a large part of this order-dependence 
and yields an order-independent skeleton. Existing high-dimensional consistency results 
for the PC algorithm remain valid for the PC-stable algorithm under the same conditions. 
Moreover, we showed that the PC-stable algorithm yields improved and more stable esti- 
mation in sparse high-dimensional settings, both for simulated data and a real data set. All 



software is implemented in the R-package pcalg (Kalisch et al. 2012) 
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Compared to the adaptation of Cano et al. (2008), the PC-stable algorithm is much 
simpler and preserves existing soundness, completeness, and high-dimensional consistency 
results. Moreover, PC-stable can be easily used together with other adaptations of the PC 
algorithm, for example hybrid versions of PC with score-based methods (Singh and Valtorta 



(1993); Spirtes and Meek (1995); van Dijk et al. (2003) or the PC* algorithm (Spirtes et al. 



2000 



Section 5.4.2.3). Moreover, the modification in PC-stable is also relevant for the FCI 
algorithm ( Spirtes et al.[ 2000) and the RFCI algorithm (Colombo et al. 2012), as the first 
step of these algorithms is identical to Step 1 of the PC algorithm. The existing consistency 



2012) 



results of FCI and RFCI remain valid under the same conditions (Colombo et al. 
when using our modification. 

The edge orientations generally remain order-dependent in the PC-stable algorithm. A 



possible solution for this problem would be to use stability selection (Meinshausen and 



Biihlmann 2010) to find the most stable orientations, where the variable ordering should 



be permuted in each stability run. In fact, Stekhoven et al. (2012) already proposed a com- 
bination of IDA and stability selection which lead to improved performance when compared 
to plain IDA, but they used the original PC algorithm and did not permute the variable 
ordering. Our findings in this paper suggest that incorporating PC-stable into this ap- 
proach and permuting the variable ordering in each stability run is likely to lead to further 
improvements. 
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